变异推理(VI)的核心原理是将计算复杂后概率密度计算的统计推断问题转换为可拖动的优化问题。该属性使VI比几种基于采样的技术更快。但是,传统的VI算法无法扩展到大型数据集,并且无法轻易推断出越野数据点,而无需重新运行优化过程。该领域的最新发展,例如随机,黑框和摊销VI,已帮助解决了这些问题。如今,生成的建模任务广泛利用摊销VI来实现其效率和可扩展性,因为它利用参数化函数来学习近似的后验密度参数。在本文中,我们回顾了各种VI技术的数学基础,以构成理解摊销VI的基础。此外,我们还概述了最近解决摊销VI问题的趋势,例如摊销差距,泛化问题,不一致的表示学习和后验崩溃。最后,我们分析了改善VI优化的替代差异度量。
translated by 谷歌翻译
在SARS-COV-2大流行期间,戴着面膜穿着成为防止传播和收缩病毒的有效工具。监测人口中面膜速率的能力将用于确定对病毒的公共卫生策略。然而,用于检测面罩的人工智能技术尚未在现实​​生活中以大规模部署在公共场合的大规模中。在本文中,我们介绍了由两个单独的模块组成的两步​​面掩模检测方法:1)面部检测和对准,2)面掩模分类。这种方法使我们能够尝试不同的面部检测和面罩分类模块的组合。更具体地说,我们尝试使用金字塔和视网膜作为面部探测器,同时保持面罩分类模块的轻质骨干。此外,我们还提供了Aizoo数据集的测试集的重叠注释,在那里我们纠正了某些面部图像的错误标签。 Aizoo和Moxa 3K数据集的评估结果表明,所提出的面罩检测管道超越了最先进的方法。所提出的管道在AIZOO数据集的重叠测试组上也产生了比原始测试集更高的映射。由于我们使用野外的面部图像培训了所提出的模型,我们可以成功部署我们的模型来使用公共CCTV图像监控戴掩模速率。
translated by 谷歌翻译
本文提出了一种名为定位变压器(LOTR)的新型变压器的面部地标定位网络。所提出的框架是一种直接坐标回归方法,利用变压器网络以更好地利用特征图中的空间信息。 LOTR模型由三个主要模块组成:1)将输入图像转换为特征图的视觉骨干板,2)改进Visual Backone的特征表示,以及3)直接预测的地标预测头部的变压器模块来自变压器的代表的地标坐标。给定裁剪和对齐的面部图像,所提出的LOTR可以训练结束到底,而无需任何后处理步骤。本文还介绍了光滑翼损失功能,它解决了机翼损耗的梯度不连续性,导致比L1,L2和机翼损耗等标准损耗功能更好地收敛。通过106点面部地标定位的第一个大挑战提供的JD地标数据集的实验结果表明了LOTR在排行榜上的现有方法和最近基于热爱的方法的优势。在WFLW DataSet上,所提出的Lotr框架与若干最先进的方法相比,展示了有希望的结果。此外,我们在使用我们提出的LOTRS面向对齐时,我们报告了最先进的面部识别性能的提高。
translated by 谷歌翻译
近似复杂的概率密度是现代统计中的核心问题。在本文中,我们介绍了变分推理(VI)的概念,这是一种机器学习中的流行方法,该方法使用优化技术来估计复杂的概率密度。此属性允许VI汇聚速度比经典方法更快,例如Markov Chain Monte Carlo采样。概念上,VI通过选择一个概率密度函数,然后找到最接近实际概率密度的家庭 - 通常使用Kullback-Leibler(KL)发散作为优化度量。我们介绍了缩窄的证据,以促进近似的概率密度,我们审查了平均场变分推理背后的想法。最后,我们讨论VI对变分式自动编码器(VAE)和VAE-生成的对抗网络(VAE-GAN)的应用。用本文,我们的目标是解释VI的概念,并通过这种方法协助协助。
translated by 谷歌翻译
In molecular research, simulation \& design of molecules are key areas with significant implications for drug development, material science, and other fields. Current classical computational power falls inadequate to simulate any more than small molecules, let alone protein chains on hundreds of peptide. Therefore these experiment are done physically in wet-lab, but it takes a lot of time \& not possible to examine every molecule due to the size of the search area, tens of billions of dollars are spent every year in these research experiments. Molecule simulation \& design has lately advanced significantly by machine learning models, A fresh perspective on the issue of chemical synthesis is provided by deep generative models for graph-structured data. By optimising differentiable models that produce molecular graphs directly, it is feasible to avoid costly search techniques in the discrete and huge space of chemical structures. But these models also suffer from computational limitations when dimensions become huge and consume huge amount of resources. Quantum Generative machine learning in recent years have shown some empirical results promising significant advantages over classical counterparts.
translated by 谷歌翻译
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage its full potential, fine-tuning still appears to be necessary. Fine-tuning the entire CLIP model can be resource-intensive and unstable. Moreover, recent methods that aim to circumvent this need for fine-tuning still require access to images from the target distribution. In this paper, we pursue a different approach and explore the regime of training-free "name-only transfer" in which the only knowledge we possess about the downstream task comprises the names of downstream target categories. We propose a novel method, SuS-X, consisting of two key building blocks -- SuS and TIP-X, that requires neither intensive fine-tuning nor costly labelled data. SuS-X achieves state-of-the-art zero-shot classification results on 19 benchmark datasets. We further show the utility of TIP-X in the training-free few-shot setting, where we again achieve state-of-the-art results over strong training-free baselines. Code is available at https://github.com/vishaal27/SuS-X.
translated by 谷歌翻译
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video. We validate our pipeline on synthetic data and propose a simple end-to-end point tracking model TAP-Net, showing that it outperforms all prior methods on our benchmark when trained on synthetic data.
translated by 谷歌翻译
因果和归因研究对于地球科学发现至关重要,对于为气候,生态和水政策提供信息至关重要。但是,当前的方法需要与科学和利益相关者挑战的复杂性以及数据可用性以及数据驱动方法的充分性相结合。除非通过物理学进行仔细的通知,否则它们会冒着将相关性与因果关系相关或因估计不准确而淹没的风险。鉴于自然实验,对照试验,干预措施和反事实检查通常是不切实际的,因此已经开发了信息理论方法,并在地球科学中不断完善。在这里,我们表明,基于转移熵的因果图最近在具有备受瞩目的发现的地球科学中变得流行,即使增强具有统计学意义,也可能是虚假的。我们开发了一种基于子样本的合奏方法,用于鲁棒性因果分析。模拟数据以及气候和生态水文中的观察表明,这种方法的鲁棒性和一致性。
translated by 谷歌翻译
最近的研究提出了一系列针对深度任务模型的专业优化算法。通常声称这些多任务优化(MTO)方法产生的解决方案优于仅通过优化任务损失的加权平均值而获得的解决方案。在本文中,我们对各种语言和视觉任务进行大规模实验,以检查这些主张的经验有效性。我们表明,尽管这些算法的设计和计算复杂性增加了,但MTO方法并未产生超出传统优化方法可实现的性能的任何改进。我们强调了替代策略,这些策略始终如一地提高性能概况,并指出可能导致次优效果的常见训练陷阱。最后,我们概述了可靠地评估MTO算法的性能并讨论潜在解决方案的挑战。
translated by 谷歌翻译
临时团队合作的进步有可能创建在现实世界应用程序中合作的代理商。但是,部署在现实世界中的代理人容易受到颠覆它们的意图的对手。在临时团队工作中,几乎没有研究对手的存在。我们解释了扩展临时团队工作以包括对手的存在的重要性,并澄清了为什么这个问题很困难。然后,我们提出了一些在临时团队合作中的新研究机会的指示,这会导致更强大的多代理网络物理基础设施系统。
translated by 谷歌翻译